The neurochip TOTEM: a case study in HEP. 
Abstract



We show that the neurochip Totem is a viable solution for high-quality,
real-time computational tasks in HEP, including event classification, triggering
and signal processing. The architecture of the chip is based on a "derivative-free"
algorithm called Reactive Tabu Search (RTS), which performs well even with
low-precision weights. ISA, VME or PCI boards integrate the chip as a coprocessor in a
host computer. This paper presents: 1) the state of the art and the next evolution
of the design of Totem; 2) its performance in the Higgs search at LHC as an example.
1 Introduction



Neural networks implemented in VLSI hardware are considered good
candidates for time-critical, high-quality pattern recognition
in High Energy Physics (HEP) [1-3]. The main benefit is
speed, owing to the massively parallel architecture. The usual cost is a very
complex architecture, since common algorithms such as backpropagation,
being derivative-based, require high-precision computation [4].
To gain a significant improvement in this respect, Battiti and Tecchiolli
devised a "derivative-free" algorithm in the context of a novel approach to the
training problem, which is first transformed into a combinatorial optimization
task and then solved by means of the heuristic method called Reactive Tabu
Search (RTS) [7,8].

RTS is based on the construction of search trajectories in the space of
binary strings of length L = N * B, into which the N weights needed to configure
a neural network are coded using B bits per weight. The search is
intended to locate the best "suboptimal" minimum on a cost surface by means
of a sequence of elementary moves, each consisting of a single bit-flip in the
string of weights. When a move is made, its inverse is forbidden for a
prohibition period of T successive steps (Glover's Tabu Search method [10]),
which allows some diversification in the search process. RTS markedly
enhances this diversification by dynamically adjusting the parameter T
through a simple mechanism that evaluates and reacts to the current local
shape of the cost surface. In this way it escapes rapidly from local minima and
cycles and finds solutions even for low-precision weights, largely
independently of the starting point.
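The search loop described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the algorithm of [7,8] nor what the chip executes: the function name, the toy cost function, and the specific rules for raising and lowering T (increase on a revisited configuration, slow decrease after 50 steps without repetitions) are assumptions chosen only to show the bit-flip moves, the prohibition period and the reactive adjustment.

```python
import random

def reactive_tabu_search(cost, length, steps=1000, seed=0):
    """Minimise cost(bits) over binary strings (simplified, illustrative RTS)."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(length)]
    best_bits, best_cost = bits[:], cost(bits)
    last_flip = [float("-inf")] * length  # step at which each bit was last flipped
    visited = set()                       # configurations seen so far
    T = 1                                 # current prohibition period
    quiet_steps = 0                       # steps since the last repetition

    def cost_after_flip(i):
        bits[i] ^= 1
        c = cost(bits)
        bits[i] ^= 1
        return c

    for step in range(steps):
        key = tuple(bits)
        if key in visited:
            # Reaction (assumed rule): a repetition signals cycling, so raise T.
            T = min(T + max(1, T // 10), max(1, length - 2))
            quiet_steps = 0
        else:
            visited.add(key)
            quiet_steps += 1
            if quiet_steps > 50:
                # No repetitions for a while (assumed rule): relax T again.
                T = max(1, T - 1)
                quiet_steps = 0
        # A bit flipped within the last T steps may not be flipped back.
        allowed = [i for i in range(length) if step - last_flip[i] > T]
        if not allowed:
            allowed = list(range(length))
        # Take the best admissible move, even if it increases the cost.
        move = min(allowed, key=cost_after_flip)
        bits[move] ^= 1
        last_flip[move] = step
        current = cost(bits)
        if current < best_cost:
            best_bits, best_cost = bits[:], current
    return best_bits, best_cost

# Toy usage: recover a hidden bit string from the number of mismatched bits.
target = [1, 0, 1, 1, 0, 0, 1, 0]
mismatches = lambda b: sum(x != y for x, y in zip(b, target))
found_bits, found_cost = reactive_tabu_search(mismatches, len(target), steps=50)
```

In a training setting the cost would be the network error evaluated on the weights decoded from the bit string; that decoding is left out here because the coding scheme is not specified in this section.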

Sect. 2 is devoted to a description of the neurochip Totem; Sect. 3 presents
a new architectural design; Sect. 4 gives the results of a sample application,
namely the extraction of the Higgs events from background in simulated data
at LHC energies.

2 The Totem chip 

Totem is a full-custom chip designed to operate as a co-processor in a host 
system, carrying out the most compute-intensive operations for RTS [9]. ISA
and high-performance PCI and VME boards have been developed to couple the
chip to the host. The chip includes an array of 32 parallel processors with associated
on-chip weight memory and control logic with broadcast and output buses.
Pipelining is included to speed up operations. A 32-bit static storage register
on the output of the multiply-accumulate units (MACs) allows data transfer from the neurons of a layer
in an MLP net to occur concurrently with a parallel input-multiply-accumulate
operation on all the processors. The memory depth of 128 8-bit words allows
neurons with up to 128 inputs to be implemented. Because of the sequential 
access to the weights, the chip can realize different MLP topologies with a 
high degree of flexibility: the memory bank can either be assigned to a sin- 
gle neuron or be partitioned among neurons on different layers. The sigmoid 
function is implemented off-chip by a RAM-based look-up table. Up to four 
chips can be paralleled in each layer of a network. 

With 250,000 transistors on a 70 mm² die manufactured in a 1.2 µm CMOS
technology, TOTEM performs 1 GMAC/sec when clocked at 30 MHz.
The transition to a 0.8 µm CMOS technology, currently in progress, will double
the processor density and raise the operating speed.
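The quoted throughput follows from the figures given in this section. The short calculation below is a back-of-the-envelope check that assumes each of the 32 processors completes one multiply-accumulate per clock cycle and ignores pipeline fill and I/O overhead; the variable names are illustrative.

```python
# Figures quoted in the text.
processors = 32          # parallel processors on the chip
clock_hz = 30e6          # 30 MHz clock
max_inputs = 128         # weight memory depth per processor

# Assumption: one multiply-accumulate per processor per clock cycle.
peak_macs_per_second = processors * clock_hz

# Time for 32 neurons to accumulate 128 inputs each, pipeline latency ignored.
layer_time_s = max_inputs / clock_hz

print(f"peak rate: {peak_macs_per_second / 1e9:.2f} GMAC/s")
print(f"128-input layer: {layer_time_s * 1e6:.2f} microseconds")
```

Under these assumptions the peak rate is 0.96 GMAC/s, consistent with the rounded figure of 1 GMAC/sec.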

3 Advances in the design of Totem and the plog encoding 

A considerable fraction of the silicon real estate of the TOTEM chip had to
be devoted to the multiplication units and to the memories where the weights
are stored. Both areas will be reduced by the technological migration
mentioned above. For the multipliers, however, an alternative
and complementary approach can be considered: moving to a logarithmic
representation of the feature inputs and the axon weights, so that multiplication is
replaced by addition, which is cheaper in terms of silicon area and
allows both lower power consumption and a faster clock rate. Some of the
authors (P. Lee, I. Lazzizzera, A. Sartori, G. Tecchiolli and A. Zorat) are
exploring this approach [5]. The first problem is to find a reasonable approximation
to the bin-to-log and log-to-bin conversions, since they are quite expensive [6].
If one defines, for any positive real number $x$, the function $\eta(x) = n \in \mathbb{N}$ such
that $x \in [2^n, 2^{n+1}[$, and $\mathrm{plog}(x) = \eta(x) + x/2^{\eta(x)} - 1$, one gets an approximation
of $\log_2(x)$ with a maximum error of only 0.0861. When $x = \sum_{i=0}^{W-1} b_i 2^i$ is
a binary-encoded positive number with $W = 2^w$ bits, then evidently $b_i = 0$
for $W - 1 \geq i > \eta(x)$ and $b_{\eta(x)} = 1$. Writing $\mathrm{plog}(x) = \sum_{i=-f}^{w-1} p_i 2^i$, it follows
that the bits $p_{w-1}, p_{w-2}, \ldots, p_0$ are the binary encoding of $\eta(x)$ and the bits
$p_{-1}, p_{-2}, \ldots, p_{-f}$ are given by $b_{\eta(x)-1}, b_{\eta(x)-2}, \ldots, b_{\eta(x)-f}$ respectively. Clearly $f$
is an integer parameter that sets the truncation (quantization error).
This way one gets the basis of a sign-magnitude, fixed-point plog encoding
of an integer $x \in [-(2^W - 1), 2^W - 1]$ with $1 + w + f$ bits: 1 bit (given by
$b_W$) for the sign; $w$ bits encoding $\eta(x)$ (the integer part); $f$ bits (given by
$b_{\eta(x)-1} b_{\eta(x)-2} \cdots b_{\eta(x)-f}$) for the fractional part ($x = 0$ is coded in a particular
way). The total error includes the quantization ($\sim 2^{-f}$) and the approximation
to $\log_2$ ($< 0.0861$): it amounts to at most 10%. When applying the
plog encoding to neural nets, the multiplier stage of a neuron is replaced by
an adder and a plog-to-bin unit. In such a plog-based architecture the RTS
training method for a Multi Layer Perceptron (MLP) is applied without
modification. The error of at most 10% estimated above is the same as the error
that Totem incurs with 4-bit weights in the conventional multiplier
architecture; even with such low-precision weights, however, adequate solutions are
still obtained for many problems [8]. The point is that, assuming the same
fabrication technology, the area of the multiplication blocks is reduced by
a factor of 10, with a reduction in power consumption by a factor of 12 and an
increase in computational speed by a factor of 3. The reduction in power
consumption can be exploited to increase both the number of processors per die
and the operating speed. These figures pave the way to new implementations
with high performance at the same fabrication cost. As an example,
the fabrication of a neural processor hosting hundreds of neurons running at
a reasonable 100 MHz clock rate is feasible within a couple of years. With
such a processor, triggering tasks requiring neural nets of approximately 100
neurons can easily sustain input rates of the order of 10^ events per second,
thus making its use possible even in the most time-critical experiments, such
as those at LHC.
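The definitions of $\eta(x)$ and plog given in this section can be checked numerically. The Python sketch below is an illustration written for this purpose, not the hardware conversion logic: the function names (`eta`, `plog`, `plog_inverse`, `plog_encode`, `approx_product`) are hypothetical, inputs are assumed to satisfy $x \geq 1$, and the sign bit and the special coding of $x = 0$ are left out.

```python
import math

def eta(x):
    """Integer n with 2**n <= x < 2**(n+1); assumes x >= 1."""
    return int(x).bit_length() - 1

def plog(x):
    """Piecewise-linear approximation of log2(x) as defined in the text."""
    n = eta(x)
    return n + x / 2**n - 1

def plog_inverse(y):
    """Inverse conversion (plog-to-bin) for y >= 0."""
    n = math.floor(y)
    return 2**n * (1 + y - n)

def plog_encode(x, f):
    """Fixed-point plog of a positive integer: (eta, f-bit truncated fraction)."""
    n = x.bit_length() - 1
    below_leading_one = x - (1 << n)      # bits b_{eta-1} ... b_0
    if n >= f:
        frac = below_leading_one >> (n - f)   # keep the top f bits (truncation)
    else:
        frac = below_leading_one << (f - n)   # pad with zeros on the right
    return n, frac

def approx_product(a, b):
    """Multiply by adding plog values, then converting back."""
    return plog_inverse(plog(a) + plog(b))

# Largest deviation from log2 over one octave (the pattern repeats per octave).
max_error = max(abs(plog(1 + k / 10000) - math.log2(1 + k / 10000))
                for k in range(10000))
print(f"max |plog - log2| = {max_error:.4f}")
```

For example, `plog_encode(13, 2)` keeps only the two bits following the leading one of 1101, so the fractional part 0.625 is truncated to 0.5; this is the quantization controlled by $f$.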

4 Higgs search: observables and performance of TOTEM 

Totem has been tested in the discrimination of Higgs events from background 
at LHC energies using simulation data obtained by the PYTHIA/JETSET 
Monte Carlo code. 
We arbitrarily assume the Higgs mass to be $M_H = 200$ GeV/$c^2$, just above
the threshold for the creation of two real Z's [12]. In this case the dominant
production mechanism is gluon-gluon fusion and the best decay channel
for its identification is the so-called gold-plated channel:

$pp \to H X \to Z Z \to 4\mu X$,
whose cross section is 2.84 x 10~^^ mb as computed by the Pythia MC code. 
We consider the two main expected backgrounds, given the measured top
quark mass ($M_t = 175$ GeV [14]):

$pp \to t\bar{t} X \to 4\mu X'$

with 4 muons produced by semileptonic decays of the top and antitop; 

$pp \to Z^0 b\bar{b} X \to \mu^+ \mu^- \mu^+ \mu^- X'$
with a muon pair produced by the $Z^0$ decay and the other one by semileptonic
$b$ and $\bar{b}$ decays. These two backgrounds have cross-sections respectively of
7.84 X 10~^ mb and 6 x 10~^ mb as computed by the Pythia MC code. 
We order the final muons according to the magnitude of their transverse
momenta and use the following ten variables as physical observables:
($X_1$ - $X_4$) the transverse momenta of the four muons;
($X_5$ - $X_8$) the invariant masses of the four $\mu^+ \mu^-$ pairs;
($X_9$) the invariant mass of the four muons;
($X_{10}$) the hadron multiplicity of the hard jets, discriminated according to the
$k_\perp$ clustering algorithm for hadron-hadron collisions [13].
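As an illustration of how the ten inputs could be assembled from a reconstructed event, the sketch below computes them from four muon four-momenta. It is not the analysis code used for the results: the function names, the `(charge, (E, px, py, pz))` event layout, and the order in which the four opposite-sign pair masses are listed are assumptions, and the jet hadron multiplicity is taken as a precomputed input because the clustering step is outside the scope of this example.

```python
import math
from itertools import product

def transverse_momentum(p):
    """p = (E, px, py, pz); returns sqrt(px^2 + py^2)."""
    return math.hypot(p[1], p[2])

def invariant_mass(*momenta):
    """Invariant mass of the summed four-momenta (natural units)."""
    e, px, py, pz = (sum(p[i] for p in momenta) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def observables(muons, jet_hadron_multiplicity):
    """muons: four (charge, (E, px, py, pz)) tuples, two of each charge."""
    # Order the muons by decreasing transverse momentum.
    ordered = sorted(muons, key=lambda m: transverse_momentum(m[1]), reverse=True)
    x = [transverse_momentum(p) for _, p in ordered]            # X1..X4
    plus = [p for q, p in ordered if q > 0]
    minus = [p for q, p in ordered if q < 0]
    x += [invariant_mass(a, b) for a, b in product(plus, minus)]  # X5..X8
    x.append(invariant_mass(*[p for _, p in ordered]))           # X9
    x.append(jet_hadron_multiplicity)                            # X10
    return x

# Toy event with massless muons (energies and momenta in GeV).
event = [(+1, (50.0, 50.0, 0.0, 0.0)), (-1, (50.0, -50.0, 0.0, 0.0)),
         (+1, (30.0, 0.0, 30.0, 0.0)), (-1, (30.0, 0.0, -30.0, 0.0))]
features = observables(event, jet_hadron_multiplicity=12)
```

With two muons of each charge there are exactly four opposite-sign pairs, which matches the four pair masses listed above.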

Totem has been trained using a sample of 4000 Higgs events, mixed with
2000 events of each of the backgrounds. The test set, entirely distinct from the
training set, consisted of $N_h = 2000$ Higgs events mixed with about 360,000 $t\bar{t}$ and
270,000 $Zb\bar{b}$ events (thus respecting only the ratio between the cross
sections of the two backgrounds). Some results are listed in Table 1, where
$N_H$ is the number of events correctly classified as Higgs, $N_B$ is the number of
events wrongly classified as Higgs and $\delta$ is the interval amplitude within
which the classification of a Higgs is assumed certainly correct, in units
corresponding to 1/8192 of the gap between the truth value of a Higgs event
and the truth value of a background event. Efficiency ($N_H/N_h$) and purity
($N_H/(N_H + N_B)$) are also shown; the purity is linearly extrapolated to the number of
background events that corresponds to 2000 Higgs events according to the estimated
cross sections above.
delta    N_H     N_B    efficiency    extrap. pur.
         753      31       0.37          0.61
  2     1102      55       0.55          0.56
  5     1228      95       0.61          0.45
 10     1498     172       0.74          0.36
100     1863     848       0.93          0.12

Table 1

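The two figures of merit can be recomputed from the counts in Table 1. The sketch below is an illustration under stated assumptions: `background_scale` is a hypothetical parameter standing for the factor by which the test-sample background must be multiplied to match the background-to-signal ratio implied by the cross sections, and its value is deliberately not filled in here, so only the unextrapolated purity is evaluated.

```python
def efficiency(n_h_correct, n_h_total):
    """Fraction of true Higgs events classified as Higgs."""
    return n_h_correct / n_h_total

def purity(n_h_correct, n_b_wrong, background_scale=1.0):
    """Fraction of events classified as Higgs that are true Higgs events.

    background_scale (an assumption of this sketch) rescales the number of
    misclassified background events linearly, as in a linear extrapolation
    to a larger background sample.
    """
    return n_h_correct / (n_h_correct + background_scale * n_b_wrong)

N_H_TOTAL = 2000
# (N_H, N_B) pairs taken from the rows of Table 1.
rows = [(753, 31), (1102, 55), (1228, 95), (1498, 172), (1863, 848)]

for n_h, n_b in rows:
    print(f"N_H={n_h:5d}  N_B={n_b:4d}  "
          f"efficiency={efficiency(n_h, N_H_TOTAL):.3f}  "
          f"purity on test sample={purity(n_h, n_b):.3f}")
```

The efficiencies obtained this way agree with the tabulated values to two digits, while the test-sample purities are higher than the extrapolated ones because the test sample contains proportionally fewer background events than the cross sections imply.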
References 






[1] B. Denby, The use of Neural Networks in High Energy Physics, in Neural
Computation 4(5) 1976

[2] R.K. Boch, 1. Carter and L.C. Legrand, ATLAS/DAQ-No-11 EAST 94-08, CERN 
(1994) 

[3] Th. Lindblad et al., Nucl. Instrum. Methods, 356 (1995) 498.

[4] R. Battiti and G. Tecchiolli, Learning with first, second, and no derivatives: A 
case study in high energy physics. Neurocomputing 6 (1994) 181-206. 

[5] P. Lee, I. Lazzizzera, A. Sartori, G. Tecchiolli and A. Zorat, Nuclear Instruments
& Methods in Physics Research A, in print.

[6] R. De Mori, R. Cardin, A new design approach to binary logarithm computation. 
Signal Processing, 13(2), Sept. 1987, pp. 177-195. 

[7] R. Battiti and G. Tecchiolli, The Reactive Tabu Search, ORSA Journal on 
Computing, 6 (2) (1994) 126-140. 

[8] R. Battiti and G. Tecchiolli, Training Neural Nets with the Reactive Tabu Search, 
Tech. Rep. UTM 421, Dip. di Matematica, Univ. di Trento - Italy. Shorter version 
to appear in IEEE Transactions on Neural Networks 

[9] G.Anzellotti et al., J. of Mod. Phys.C, 6 (1995) 555-560 

[10] F. Glover, ORSA Journal on Computing, 1(3), pp.190-206 (1989) 

[11] C. R. Baugh and B. A. Wooley, A Two's Complement Parallel Array
Multiplication Algorithm, IEEE Transactions on Computers C-22 (12) 1045-1047.

[12] M. Lüscher, P. Weisz, Nucl. Phys. B290 (1987) 5; ibid. B295 (1988) 65; ibid.
B318 (1989) 705

[13] S. Catani, Y.L. Dokshitzer, M.H. Seymour, B.R. Webber, Nucl. Phys. B406 (1993)
187

[14] M. Mangano and T. Trippe (of the Particle Data Group), in Review of Particle
Properties, Phys. Rev. D54 (1996) 309.






